Generic 3D Representation via Pose Estimation and Matching: Supplementary Material

Authors

  • Amir R. Zamir
  • Tilman Wekel
  • Pulkit Agrawal
  • Colin Wei
  • Jitendra Malik
  • Silvio Savarese
Abstract

In this supplementary material, we provide the following items: 1) experiments on joint (vs. single) feature learning, 2) experiments on the Brown et al. and Mikolajczyk & Schmid benchmarks, 3) more details about the dataset, including the pixel alignment procedure, sample data, and coverage, 4) an evaluation of surface normal estimation on the NYUv2 dataset, 5) a joint embedding of synthetic cubes and images, 6) an illustration of pose induction via tSNE embeddings of more ImageNet classes and MIT Places, and 7) more details on the training procedure.

1 Joint Feature Learning

We investigated different aspects of jointly learning the representation and of information sharing between the two supervised tasks in Tables 1 and 2. To quantify the amount of information shared between the matching and pose estimation tasks, we trained a single-task network dedicated to each problem; the error of each network on its own task is reported in the "Direct" row of Table 1. The "Transduction" row provides the error rate when a linear classifier is trained on the frozen representation of one task to solve the other task. The fact that the transduction setup achieves reasonable performance suggests that the representations of the two problems share a great deal of information. Table 2 compares the performance of single-task vs. multi-task networks. The multi-task network performs comparably to its dedicated counterparts, showing that it encodes both problems with no performance drop.

Table 1: Information Sharing Among Supervised Tasks

                  Matching (FPR)   Pose (Error)
  Direct          23.54%           16.58°
  Transduction    30.06%           24.50°
  Chance          95%              90°

Table 2: Joint vs. Individual Learning

                  Matching (FPR)   Pose (Error)
  Pose-Net        -                16.58°
  Matching-Net    23.54%           -
  Joint-Net       23.0%            17.78°
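The "Transduction" setup above is effectively a linear probe: descriptors are extracted from the frozen network of one task, and a linear classifier is trained on them for the other task. The sketch below illustrates this protocol; the arrays, the 512-D feature size, and the discretization of pose into angle bins are hypothetical stand-ins, as the exact probing details are not specified in this excerpt.

```python
# Linear-probe ("transduction") sketch: fit a linear classifier on frozen
# features from one task's network to solve the other task.
# The feature arrays and the pose binning below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(train_feats, train_labels, test_feats, test_labels):
    """Train a linear classifier on frozen features; return test error rate."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_labels)
    return 1.0 - clf.score(test_feats, test_labels)  # score() = accuracy

# Example: probe a frozen matching network for (discretized) pose.
# feats_* : (N, 512) descriptors taken from the frozen matching network.
# pose_*  : (N,) pose labels, e.g. relative camera angle binned into classes.
rng = np.random.default_rng(0)
feats_train = rng.normal(size=(1000, 512))
feats_test = rng.normal(size=(200, 512))
pose_train = rng.integers(0, 18, size=1000)   # e.g. 18 bins of 10 degrees
pose_test = rng.integers(0, 18, size=200)

err = linear_probe(feats_train, pose_train, feats_test, pose_test)
print(f"transduction error: {err:.2%}")
```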
2 Brown et al. Benchmark

We evaluated the performance of our method on the benchmarks of Brown et al. [3] and of Mikolajczyk & Schmid [9] (next subsection). Compared to our dataset, these benchmarks mostly include narrower baselines (except for a subset of [9]); they therefore do not pronounce wide-baseline handling abilities, and our method sees more training data than the baselines do. However, they can reveal 1) whether our representation performs well only on street-view scenery, and 2) whether the wide-baseline handling capability was achieved at the expense of lower performance on small baselines. We compared our results against six baselines, including Zagoruyko & Komodakis [15] and MatchNet [6], using their descriptor dimensionality (512) and network architecture most similar to ours. For this experiment, we mixed our training dataset with the corresponding training split of Brown's (see Table 3). Except for the two splits evaluated on the Yosemite National Park set, which is substantially covered by foliage, our representation outperforms the baselines. We speculate the foliage could be the reason, since our ConvNet is mostly agnostic with respect to foliage: trees are uninformative for matching and pose in street-view scenery.

Table 3: Evaluations on Brown's benchmark [3]. FPR@95 (↓) is the metric.

  Train  Test   MatchNet [6]   Zagor. siam [15]   Simonyan [11]   Trzcinski [12]   Brown [3]   Root-SIFT [2]   Ours
  Yos    ND      7.70           5.75               6.82           13.37            11.98       22.06            4.17
  Yos    Lib    13.02          13.45              14.58           21.03            18.27       29.65           11.66
  Lib    ND      4.75           4.33               7.22           14.15            N/A         22.06            1.47
  ND     Lib     8.84           8.77              12.42           18.05            16.85       29.65            7.39
  Lib    Yos    13.57          14.89              11.18           19.63            N/A         26.71           13.78
  ND     Yos    11.00          13.23              10.08           15.86            13.55       26.71           12.30
  mean           9.81          10.07              10.38           17.01            15.16       26.14            8.46

Mikolajczyk & Schmid Benchmark. The evaluation results on the benchmark of Mikolajczyk & Schmid [9] are provided in Table 4, using the standard protocol [9,15]. Following [15,5], we performed the matching on MSER features. The last two rows show our results on the MSER patches with and without rectification (i.e., skipping MSER rectification). Our representation outperforms the baselines in both cases, while skipping the rectification actually improves the performance.

Table 4: Evaluation on Mikolajczyk & Schmid's benchmark [9]. The metric is mAP (↑).

  Transf. Magnitude     1      2      3      4      5
  SIFT [8]             40.1   28.0   24.3   29.0   17.1
  Zagor. [15]          43.2   37.5   29.2   28.0   16.8
  Fischer et al. [5]   42.3   33.9   26.1   22.1   14.6
  Ours-rectified       46.4   41.3   29.5   23.7   17.9
  Ours-unrectified     51.4   37.8   34.2   30.8   20.8
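For reference, FPR@95, the matching metric used in Tables 1-3, is the false positive rate at the operating point where 95% of the true match pairs are accepted. Below is a minimal numpy sketch of this computation, assuming similarity scores where higher means more likely to match; the score arrays are synthetic placeholders.

```python
# FPR@95 sketch: false positive rate at the similarity threshold that
# accepts 95% of the true (matching) pairs. Score arrays are synthetic.
import numpy as np

def fpr_at_95(pos_scores: np.ndarray, neg_scores: np.ndarray) -> float:
    """pos_scores: similarities of true match pairs (higher = better match).
    neg_scores: similarities of non-matching pairs."""
    # Threshold at the 5th percentile of positives, so that 95% of true
    # matches score at or above it (i.e., 95% true positive rate).
    thresh = np.percentile(pos_scores, 5)
    # Fraction of negatives that also pass the threshold = false positive rate.
    return float(np.mean(neg_scores >= thresh))

# Toy usage with synthetic scores:
rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, scale=0.5, size=10000)  # true match pairs
neg = rng.normal(loc=0.0, scale=0.5, size=10000)  # non-matching pairs
print(f"FPR@95: {fpr_at_95(pos, neg):.2%}")
```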

3 Dataset and Data Collection Details

Our dataset was collected from vast geographical areas spanning multiple cities. Some representative city locations from which data was collected are shown in Figure 1.

[Figure 1: Representative data collection locations; panels show Paris, Paris, Florence, and San Francisco.]
